Succinct Online Dictionary Matching with Improved Worst-Case Guarantees
Abstract
In the online dictionary matching problem, the goal is to preprocess a set of patterns D = {P1, ..., Pd} over an alphabet Σ of size σ so that, given an online text that arrives one character at a time, we report all occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels from a universe of size λ. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees, where, given a node and a label, we return the lowest proper ancestor of the node that has the queried label. In this paper we introduce a succinct representation of multi-labeled trees for λ = ω(1) that supports LLA queries in O(log log λ) time. Using this representation, we introduce a succinct data structure for the online dictionary matching problem when σ = ω(1). In this solution the worst-case cost per character is O(log log σ + occ) time, where occ is the size of the current output. Moreover, the amortized cost per character is O(1 + occ) time.
1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
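To make the two problems in the abstract concrete, the following is a minimal Python sketch of (a) a plain, pointer-based Aho-Corasick matcher that reports matches as each character arrives, and (b) a naive LLA query that simply walks up the tree. It is not the succinct structure the paper introduces and does not achieve the stated space or time bounds; the names build_automaton, OnlineMatcher, feed and lowest_labeled_ancestor are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain (non-succinct) Aho-Corasick automaton."""
    goto = [{}]   # goto[v]: character -> child of trie node v
    fail = [0]    # fail[v]: node of the longest proper suffix of v in the trie
    out = [[]]    # out[v]: indices of patterns that are a suffix of v
    for idx, pat in enumerate(patterns):
        v = 0
        for ch in pat:
            if ch not in goto[v]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[v][ch] = len(goto) - 1
            v = goto[v][ch]
        out[v].append(idx)
    # Breadth-first order: shallower nodes are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        v = queue.popleft()
        for ch, child in goto[v].items():
            f = fail[v]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


class OnlineMatcher:
    """Consumes the text one character at a time (illustrative only)."""

    def __init__(self, patterns):
        self.goto, self.fail, self.out = build_automaton(patterns)
        self.state = 0

    def feed(self, ch):
        """Return indices of all patterns that are a suffix of the text so far."""
        v = self.state
        while v and ch not in self.goto[v]:
            v = self.fail[v]
        self.state = self.goto[v].get(ch, 0)
        return list(self.out[self.state])


def lowest_labeled_ancestor(parent, labels, node, label):
    """Naive LLA: walk up from node's parent until a node carries the label.

    parent[v] is the parent of v (None for the root); labels[v] is a set.
    Takes time proportional to the depth, unlike the paper's O(log log λ).
    """
    v = parent[node]
    while v is not None:
        if label in labels[v]:
            return v
        v = parent[v]
    return None


matcher = OnlineMatcher(["he", "she", "his", "hers"])
reports = [matcher.feed(ch) for ch in "ushers"]

# A path 0 - 1 - 2 - 3 with a label set on every node.
parent = [None, 0, 1, 2]
labels = [{"a"}, {"b"}, {"a", "c"}, {"b"}]
```

In this sketch each call to feed returns the pattern indices before the next character is read, which mirrors the reporting requirement in the abstract. A single call may follow many failure links, so its worst-case cost per character is not bounded the way the paper's structure is.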
Similar resources
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2] and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Barrier for Worst-Case Time Bounds
Despite significant research effort, the state-of-the-art algorithm for maintaining an approximate matching in fully dynamic graphs has a polynomial worst-case update time, even for very poor approximation guarantees. In a recent breakthrough, Bhattacharya, Henzinger and Nanongkai showed how to maintain a constant approximation to the minimum vertex cover, and thus also a constant-factor estima...
Designing smoothing functions for improved worst-case competitive ratio in online optimization
Online optimization covers problems such as online resource allocation, online bipartite matching, adwords (a central problem in e-commerce and advertising), and adwords with separable concave returns. We analyze the worst case competitive ratio of two primal-dual algorithms for a class of online convex (conic) optimization problems that contains the previous examples as special cases defined o...
Succinct Sampling on Streams
A streaming model is one where data items arrive over a long period of time, either one item at a time or in bursts. Typical tasks include computing various statistics over a sliding window of some fixed time horizon. What makes the streaming model interesting is that as time progresses, old items expire and new ones arrive. One of the simplest and most central tasks in this model is sampling...
Online Dictionary Matching for Streams of XML Documents
We consider the online multiple-pattern matching problem for streams of XML documents, when the patterns are expressed as linear XPath expressions containing child operators (/), descendant operators (//) and wildcards (∗) but no predicates. For each document in the stream, the task is to determine all occurrences in the document of all the patterns. We present a general multiple-pattern-matchi...